CodeBoarding

This subsystem manages the parsing, curation, and generation of synonyms for various ontologies, and supports the integration of external knowledge sources.

Components

Ontology Parsing & Core Logic

This central component comprises the abstract base class for all ontology parsers and its concrete implementations. It defines how raw ontology data is converted into a structured format, resolves linking candidates, manages data persistence, and orchestrates the overall ontology preprocessing pipeline, including the interactions with other components for curation, synonym generation, and database population.
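
The split between an abstract parser and its concrete implementations can be illustrated with a minimal sketch. All names here (`Record`, `BaseParser`, `TsvParser`, `parse_raw`, `build`) are hypothetical and chosen for illustration; they are not the classes or method signatures of the KAZU code base, and the tab-separated input format is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical structured row: one synonym for one ontology ID."""
    idx: str
    synonym: str


class BaseParser(ABC):
    """Hypothetical base class; subclasses only say how to read raw data."""

    @abstractmethod
    def parse_raw(self, raw: str) -> list[Record]:
        """Convert raw ontology text into structured records."""

    def build(self, raw: str) -> dict[str, set[str]]:
        """Shared pipeline step: group candidate IDs by lower-cased synonym."""
        candidates: dict[str, set[str]] = {}
        for record in self.parse_raw(raw):
            candidates.setdefault(record.synonym.lower(), set()).add(record.idx)
        return candidates


class TsvParser(BaseParser):
    """Concrete parser for tab-separated 'id<TAB>synonym' lines (assumed format)."""

    def parse_raw(self, raw: str) -> list[Record]:
        rows = (line.split("\t") for line in raw.splitlines() if line.strip())
        return [Record(idx=r[0], synonym=r[1]) for r in rows]
```

In this sketch the shared logic lives in the base class, so supporting a new ontology format only requires a new `parse_raw` implementation.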

Ontology Resource Management

This component handles the loading, dumping, processing, and conflict resolution of ontology string resources. It includes functionalities for analyzing various types of conflicts (e.g., case, normalization), applying autofixes, merging resource sets, and generating comprehensive reports on the state and integrity of these resources.
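
As an illustration of what a case conflict check might look like, the sketch below groups curated strings that differ only by case and reports the groups whose members disagree on how they should be treated. The function name `find_case_conflicts` and the behaviour labels (`"keep"`, `"drop"`) are assumptions made for this example, not part of the KAZU API.

```python
from collections import defaultdict


def find_case_conflicts(resources: dict[str, str]) -> dict[str, list[str]]:
    """Return groups of strings that differ only by case but disagree on behaviour.

    `resources` maps a curated string to a behaviour label (hypothetical
    values such as "keep" or "drop").
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for text in resources:
        groups[text.casefold()].append(text)
    # A group is a conflict only if its members carry more than one label.
    return {
        key: sorted(members)
        for key, members in groups.items()
        if len({resources[m] for m in members}) > 1
    }
```

A report or autofix step could then iterate over the returned groups and either flag them for a curator or apply a chosen default.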

Ontology Data Acquisition

This component is responsible for downloading ontology data from various external sources. It includes different downloader implementations tailored for specific ontology formats and provides mechanisms for managing proxy settings and caching downloaded files.
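
The caching behaviour can be sketched as follows. This is a simplified illustration under stated assumptions: `cached_download` and its `fetch` parameter are hypothetical names, the cache key is a hash of the URL, and proxy handling is left to whatever callable is passed in as `fetch`.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_download(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local path for `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hashing the URL gives a stable, filesystem-safe file name.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

Passing the fetch function in as an argument keeps the cache logic independent of the transport, so a format-specific downloader could supply its own retrieval code.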

Synonym Generation Strategies

This component provides a suite of strategies for generating additional synonyms from existing ontology terms. These strategies range from simple string replacements and combinatorial expansions to more complex token-based and verb phrase variant generation, often leveraging NLP models. It also includes automated curation actions that can modify or generate synonyms.
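
The two simplest strategy types mentioned above, string replacement and combinatorial expansion, can be sketched with the standard library alone. The function names, the replacement table, and the separator set are assumptions for illustration; the actual strategies in the code base may be structured quite differently.

```python
from itertools import product


def replace_variants(term: str, replacements: dict[str, list[str]]) -> set[str]:
    """Generate new synonyms by swapping each known substring for its alternatives."""
    variants: set[str] = set()
    for old, alternatives in replacements.items():
        if old in term:
            variants.update(term.replace(old, new) for new in alternatives)
    variants.discard(term)  # only report strings that differ from the input
    return variants


def combinatorial_variants(term: str, separators=(" ", "-", "")) -> set[str]:
    """Re-join the tokens of a term with every combination of separators."""
    tokens = term.replace("-", " ").split()
    if len(tokens) < 2:
        return set()
    variants: set[str] = set()
    for seps in product(separators, repeat=len(tokens) - 1):
        variants.add(tokens[0] + "".join(s + t for s, t in zip(seps, tokens[1:])))
    variants.discard(term)
    return variants
```

For example, `combinatorial_variants("IL 6")` yields `{"IL-6", "IL6"}`. Note that the number of combinations grows exponentially with the token count, so a real strategy would likely cap the term length.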

KRT Ontology Tools

This component covers the functionality within the Kazu Resource Tool (KRT) for managing and updating ontologies. It includes user interface components for displaying update forms and managing downloader configurations, utilities for generating and analyzing ontology upgrade reports, and tools for managing resource conflicts within the KRT environment.

In-Memory Data Stores

This component provides efficient in-memory databases for storing and retrieving ontology metadata and synonyms. The MetadataDatabase holds descriptive information about ontology IDs, while the SynonymDatabase stores normalized synonyms and their associated linking candidates, enabling fast lookups during NLP processing.
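
The lookup pattern described here can be approximated with nested dictionaries. The classes below (`ToyMetadataStore`, `ToySynonymStore`) are deliberately hypothetical stand-ins; they do not reproduce the interface of `MetadataDatabase` or `SynonymDatabase`, and the keying by parser name is an assumption.

```python
from collections import defaultdict


class ToyMetadataStore:
    """Hypothetical stand-in: parser name -> ontology ID -> metadata dict."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, dict]] = defaultdict(dict)

    def add(self, parser: str, idx: str, metadata: dict) -> None:
        self._data[parser][idx] = metadata

    def get(self, parser: str, idx: str) -> dict:
        return self._data[parser][idx]


class ToySynonymStore:
    """Hypothetical stand-in: parser name -> normalized synonym -> candidate IDs."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add(self, parser: str, synonym_norm: str, idx: str) -> None:
        self._data[parser][synonym_norm].add(idx)

    def lookup(self, parser: str, synonym_norm: str) -> set[str]:
        # .get avoids creating empty entries for unknown parsers or synonyms.
        return set(self._data.get(parser, {}).get(synonym_norm, set()))
```

Both lookups are average-case constant time, which is the property that makes this kind of store suitable for repeated queries during NLP processing.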

Core Ontology Data Models

This component defines the fundamental data structures that represent various aspects of ontology information within the KAZU system. These models encapsulate concepts such as curated string resources, potential linking candidates, sets of equivalent IDs, global parser actions, and reports on ontology upgrades.

Text Processing Utilities

This component provides utility functions for text manipulation and processing within the KAZU system: string normalization, classification of symbolic strings, general path handling, conversion between data models, grouping, and string similarity scoring, often relying on external NLP libraries such as spaCy.
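
To make three of these utilities concrete, here is a rough sketch of normalization, symbolic-string classification, and similarity scoring. It uses only the Python standard library rather than spaCy, and the function names and heuristics are assumptions for illustration, not the rules applied by the KAZU utilities.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Strip accents, collapse non-alphanumeric runs to one space, upper-case."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^A-Za-z0-9]+", " ", no_accents).strip().upper()


def is_symbolic(text: str) -> bool:
    """Crude heuristic: a single token that contains a digit or is all upper-case."""
    return " " not in text and (any(c.isdigit() for c in text) or text.isupper())


def similarity(a: str, b: str) -> float:
    """Score two strings in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

With these definitions, `"IL-6"` and `"il 6"` normalize to the same string and therefore score 1.0, while `is_symbolic` separates gene-like tokens such as `"EGFR"` from ordinary words such as `"aspirin"`.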
